Midterm Project: Strawberries and Chemicals

Mi Zhang, Peng Liu, Yuanming Leng, Qiannan Shen

Data Cleaning for Strawberries

  • remove empty or missing values and trim extra white space in the cells
  • split columns that hold multiple items into separate columns
  • convert values “MEASURED IN CWT” to pounds by multiplying by 100
  • use a log scale on “value” so that extremely large values are easier to read
  • redefine “value” as the production of strawberries
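
The cleaning steps above can be sketched in base R roughly as follows. This is a minimal illustration, not the project's actual script: the toy data frame and its column names (State, Data.Item, Value), the helper squish(), and the "(D)" placeholder for a withheld value are all assumptions made for the example.

```r
# Hypothetical toy data; column names and values are assumptions,
# not the project's actual schema.
strawb_raw <- data.frame(
  State = c(" CALIFORNIA ", "FLORIDA", "CALIFORNIA", "FLORIDA"),
  Data.Item = c(
    "STRAWBERRIES - PRODUCTION, MEASURED IN CWT",
    "STRAWBERRIES - PRODUCTION,  MEASURED IN LB",
    "STRAWBERRIES - PRODUCTION, MEASURED IN LB",
    "STRAWBERRIES - PRODUCTION, MEASURED IN CWT"
  ),
  Value = c("1,200", " 350000", "(D)", "40"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing white space and collapse internal runs of spaces.
squish <- function(x) gsub("\\s+", " ", trimws(x))
strawb <- as.data.frame(lapply(strawb_raw, squish), stringsAsFactors = FALSE)

# Split the multi-item column on the comma into an item and a unit column.
parts <- strsplit(strawb$Data.Item, ", ", fixed = TRUE)
strawb$item <- vapply(parts, `[`, character(1), 1)
strawb$unit <- vapply(parts, `[`, character(1), 2)
strawb$Data.Item <- NULL

# Convert values to numbers; non-numeric entries become NA and are dropped.
strawb$Value <- suppressWarnings(as.numeric(gsub(",", "", strawb$Value)))
strawb <- strawb[!is.na(strawb$Value), ]

# One hundredweight (CWT) is 100 lb, so rescale those rows to pounds.
is_cwt <- strawb$unit == "MEASURED IN CWT"
strawb$Value[is_cwt] <- strawb$Value[is_cwt] * 100
strawb$unit[is_cwt] <- "MEASURED IN LB"

# A log scale keeps very large values readable in plots.
strawb$log_value <- log10(strawb$Value)
```

Converting CWT rows to pounds before taking the log puts every remaining row on the same unit, which is what makes a single “MEASURED IN LB” comparison possible.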

Measurement units and value

  • Our analysis is based mainly on the measurement units, and our main focus is on “MEASURED IN LB”.
GGally::ggpairs(strawb1, columns = c(3, 6),
                mapping = ggplot2::aes(color = strawb1[, 3], alpha = 0.5),
                lower = list(combo = GGally::wrap("facethist", binwidth = 0.5)))

Map

  • Most of the data were collected in two states: California and Florida.

Yearly value of strawberries in each state

  • The plot shows that California and Florida increasingly used all kinds of chemicals on strawberries.
plot1("MEASURED IN LB")

Data wrangling for Strawberries and Pesticides

  • drop empty rows and columns, and remove white space
  • rename the Pesticide column to chemical so that it matches the column name in the strawberries data
  • use toupper() to capitalize all chemical names
  • use pivot_longer() to gather all toxin types and their levels into a pair of columns
  • use inner_join() to join the pesticide and strawberry datasets
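
A rough sketch of these wrangling steps, assuming the dplyr and tidyr packages are installed. The two toy tables, the placeholder chemical names, the Human.Toxins column, and the toxin/level output names are assumptions made for illustration; only Bee.Toxins, toupper(), pivot_longer(), and inner_join() come from the slides.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables; names and values are placeholders, not real data.
pesticides <- data.frame(
  Pesticide    = c("chem a ", "chem b", "", "chem c"),
  Bee.Toxins   = c("moderate", "slight", "", "high"),
  Human.Toxins = c("slight", "moderate", "", "slight"),
  stringsAsFactors = FALSE
)
strawb <- data.frame(
  chemical = c("CHEM A", "CHEM B", "CHEM D"),
  Value    = c(1200, 350, 90),
  stringsAsFactors = FALSE
)

pest_long <- pesticides %>%
  mutate(across(everything(), trimws)) %>%   # remove stray white space
  filter(Pesticide != "") %>%                # drop empty rows
  rename(chemical = Pesticide) %>%           # match the strawberry column name
  mutate(chemical = toupper(chemical)) %>%   # same case on both sides of the join
  pivot_longer(c(Bee.Toxins, Human.Toxins),  # one row per chemical and toxin type
               names_to = "toxin", values_to = "level")

# inner_join() keeps only chemicals that appear in both tables.
joined <- inner_join(strawb, pest_long, by = "chemical")
```

Because inner_join() drops unmatched rows, a chemical that appears in only one table (here the placeholders CHEM C and CHEM D) does not reach the joined data, so it is worth checking how many rows are lost at this step.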

Toxin level versus Value

  • Bee-toxin-related chemicals are associated with larger strawberry production values.

Total toxin levels

  • Across both human and bee toxins, insecticides have more observations than fungicides.
p4 <- plot4("MEASURED IN LB","Bee.Toxins","CALIFORNIA")
ggplotly(p4, tooltip="y")

Bee Toxin

  • Looking solely at bee toxins, however, fungicides make up a higher proportion.
p5 <- plot5("MEASURED IN LB","Bee.Toxins","CALIFORNIA","slight")
ggplotly(p5, tooltip="y")

Conclusion

Thanks

Citations